Failure Detectors as First Class Objects

Authors

  • Pascal Felber
  • Xavier Défago
  • Rachid Guerraoui
  • Philipp Oser
Abstract

One of the fundamental differences between a centralized system and a distributed one is the notion of partial failures. The ability to detect failures efficiently and accurately is a key element underlying reliable distributed computing. In current distributed systems, however, failure detection is either left to the application developer or hidden from the programmer and provided in an ad hoc manner behind the scenes. We plead for an intermediate approach where failure detectors are first-class objects. We view failure detection as an abstraction, the complexity of which is encapsulated behind well-defined interfaces. The various roles of a failure detection service are all represented as first-class objects. Following our approach, one can reuse existing failure detection protocols as they are or, through composition or refinement, define new protocols that match the application requirements. We describe an interesting result of a composition that mixes push and pull failure monitoring, and we show how scalability issues may be addressed by using a hierarchical failure detection configuration. We also discuss the implementation of our failure service both in CORBA and in Java.
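To make the idea of mixing push and pull monitoring concrete, here is a minimal Java sketch. It is not the authors' interface or protocol: the names `Monitorable`, `HeartbeatSink`, and `SimpleFailureDetector`, the timeout parameter, and the fallback rule (ping only when heartbeats have gone silent) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical role interfaces; the names are illustrative, not the paper's API.
interface Monitorable {
    boolean ping(); // pull style: the detector asks "are you alive?"
}

interface HeartbeatSink {
    void heartbeat(String id, long nowMillis); // push style: the monitored object reports in
}

// Assumed composition: rely on pushed heartbeats, fall back to a pull (ping) when they stop.
final class SimpleFailureDetector implements HeartbeatSink {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeard = new HashMap<>();
    private final Map<String, Monitorable> pullTargets = new HashMap<>();

    SimpleFailureDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Register an object that can also be queried on demand.
    void registerPull(String id, Monitorable target) {
        pullTargets.put(id, target);
    }

    @Override
    public void heartbeat(String id, long nowMillis) {
        lastHeard.put(id, nowMillis);
    }

    // Returns true if the object is suspected to have failed at time nowMillis.
    boolean isSuspected(String id, long nowMillis) {
        Long last = lastHeard.get(id);
        if (last != null && nowMillis - last <= timeoutMillis) {
            return false; // a recent heartbeat was received
        }
        Monitorable target = pullTargets.get(id);
        if (target != null) {
            try {
                if (target.ping()) {
                    lastHeard.put(id, nowMillis); // a reply counts as a sign of life
                    return false;
                }
            } catch (RuntimeException e) {
                // treat an exception as a missing reply
            }
        }
        return true;
    }
}
```

Time is passed in explicitly so the sketch stays deterministic; a real service would read a clock and would normally run the ping asynchronously with its own timeout.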

Similar resources

Impossibility of Boosting Distributed Service Resilience

We prove two theorems saying that no distributed system in which processes coordinate using reliable registers and f-resilient services can solve the consensus problem in the presence of f+1 undetectable process stopping failures. (A service is f-resilient if it is guaranteed to operate as long as no more than f of the processes connected to it fail.) Our first theorem assumes that the give...


Contours Extraction Using Line Detection and Zernike Moment

Most contour detection methods suffer from drawbacks such as noise, occlusion of objects, and shifting, scaling and rotation of objects in the image, all of which reduce recognition accuracy. To solve the problem, this paper utilizes Zernike Moment (ZM) and Pseudo Zernike Moment (PZM) to extract object contour features in all situations such as rotation, scaling and shifting of object i...


(anti−Ω × Σz)-based k-set Agreement Algorithms

This paper considers the k-set agreement problem in a crash-prone asynchronous message passing system enriched with failure detectors. Two classes of failure detectors have been previously identified as necessary to solve asynchronous k-set agreement: the class anti-leader anti-Ω and the weak-quorum class Σ_k. The paper investigates the families of failure detectors (anti-Ω_x)_{1≤x≤n} and (Σ_z)_{1≤z≤n}. ...


Implementing the Weakest Failure Detector for Solving Consensus

The concept of unreliable failure detector was introduced by Chandra and Toueg as a mechanism that provides information about process failures. This mechanism has been used to solve several agreement problems, like Consensus. In this paper, algorithms that implement failure detectors in partially synchronous systems are presented. First two simple algorithms of the weakest class to solve Consen...


On the Impossibility of Boosting Distributed Service Resilience

We show that no deterministic algorithm can solve consensus in the presence of t+1 process crash failures, in a system of n processes that communicate in a reliable way and synchronize their activities using any number of t-resilient services. These base services can range from any type of atomic objects shared by the processes (including consensus objects), to any class of non-atomic objects l...



Publication date: 1999